Mathematics Of Machine Learning
Mathematics Of Machine Learning
Jump to year
- Paper 1, Section II, J - Let be a family of functions with . Define the shattering coefficient and the dimension of . - Briefly explain why if and , then . - Prove that if is a vector space of functions with and we define - then . - Let be the set of all spheres in . Suppose . Show that - Hint: Consider the class of functions , where 
- Paper 2, Section II, J - (a) What is meant by the subdifferential of a convex function at ? Write down the subdifferential of the function given by , where . - Show that minimises if and only if . - What does it mean for a function to be strictly convex? Show that any minimiser of a strictly convex function must be unique. - (b) Suppose we have input-output pairs with . Consider the objective function - where and . Assume that . Fix and define - where for . Show that if , then - [You may use any results from the course without proof, other than those whose proof is asked for directly.] 
- Paper 4, Section II, J - Let be a dataset of input-output pairs lying in for . Describe the random-forest algorithm as applied to using decision trees to produce a fitted regression function . [You need not explain in detail the construction of decision trees, but should describe any modifications specific to the random-forest algorithm.] - Briefly explain why for each and , we have . - State the bounded-differences inequality. - Treating as deterministic, show that with probability at least , - where . - Hint: Treat each as a random variable taking values in an appropriate space (of functions), and consider a function satisfying 
- Paper 1, Section II, J - (a) Let be i.i.d. random elements taking values in a set and let be a class of functions . Define the Rademacher complexity . Write down an inequality relating the Rademacher complexity and - State the bounded differences inequality. - (b) Now given i.i.d. input-output pairs consider performing empirical risk minimisation with misclassification loss over the class of classifiers . Denote by the empirical risk minimiser [which you may assume exists]. For any classifier , let be its misclassification risk and suppose this is minimised over by . Prove that with probability at least , - for , where is a class of functions related to that you should specify. - (c) Let for . Define the empirical Rademacher complexity . Show that with probability at least , 
- Paper 2, Section II, J - (a) Let be a family of functions . What does it mean for to be shattered by ? Define the shattering coefficient and the dimension of - Let - and set . Compute . - (b) State the Sauer-Shelah lemma. - (c) Let be families of functions with finite VC dimension . Now suppose is shattered by . Show that - Conclude that for , - [You may use without proof the fact that if with and , then for .] - (d) Now let be the collection of subsets of of the form of a product of intervals , where exactly of the are of the form for and the remaining intervals are . Set . Show that when , 
- Paper 4, Section II, J - Suppose we have input-output pairs . Consider the empirical risk minimisation problem with hypothesis class - where is a non-empty closed convex subset of , and logistic loss - for and . - (i) Show that the objective function of the optimisation problem is convex. - (ii) Let denote the projection of onto . Describe the procedure of stochastic gradient descent (SGD) for minimisation of above, giving explicit forms for any gradients used in the algorithm. - (iii) Suppose minimises over . Suppose and . Prove that the output of iterations of the SGD algorithm with some fixed step size (which you should specify), satisfies - (iv) Now suppose that the step size at iteration is for each . Show that, writing for the th iterate of SGD, we have - where - [You may use standard properties of convex functions and projections onto closed convex sets without proof provided you state them.]